Frequency Ratio: a method for dealing with missing values within nearest neighbour search

Authors

  • Rosanne Janssen
  • Pieter Spronck
  • Pauline Dibbets
  • Arnoud Arntz
Abstract

In this paper we introduce the Frequency Ratio (FR) method for dealing with missing values within nearest neighbour search. We test the FR method on known medical datasets from the UCI machine learning repository. For the purpose of classification, we compare the accuracy of the FR method with that of five commonly used methods (three “imputation” and two “bypassing” methods) for dealing with values that are “missing completely at random” (MCAR). We find that, in most cases, the FR method outperforms the other methods. We conclude that the FR method is a strong addition to the commonly used methods for dealing with missing values within the nearest neighbour method.
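The abstract does not spell out the FR method or name the five baselines, so the following is only a minimal sketch of the two families of baselines it mentions, not the authors' method. It assumes mean imputation as one possible "imputation" method and a partial Euclidean distance over jointly observed features as one possible "bypassing" method. The function names (`mask_mcar`, `mean_impute`, `bypass_distance`, `nearest_label`) and the `MISSING` marker are hypothetical.

```python
import math
import random

MISSING = None  # hypothetical marker for a missing value in this sketch


def mask_mcar(rows, rate, rng):
    """Drop each value independently with probability `rate` (MCAR masking)."""
    return [[MISSING if rng.random() < rate else v for v in row] for row in rows]


def mean_impute(rows):
    """Imputation (assumed baseline): fill gaps with the column mean of observed values."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        # Fall back to 0.0 if a column has no observed value at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is MISSING else v for j, v in enumerate(r)] for r in rows]


def bypass_distance(a, b):
    """Bypassing (assumed baseline): Euclidean distance over features observed
    in both records, rescaled by the share of features that could be used."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not pairs:
        return math.inf  # no shared feature: treat the records as maximally far apart
    squared = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(squared * len(a) / len(pairs))


def nearest_label(query, rows, labels, distance):
    """1-nearest-neighbour classification with a pluggable distance function."""
    best = min(range(len(rows)), key=lambda i: distance(query, rows[i]))
    return labels[best]


if __name__ == "__main__":
    rng = random.Random(0)
    train = [[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.5, 9.5]]
    labels = ["a", "a", "b", "b"]
    masked = mask_mcar(train, rate=0.25, rng=rng)
    print(nearest_label([1.1, MISSING], masked, labels, bypass_distance))
```

The design difference is that imputation changes the data before the search runs, whereas bypassing leaves the data untouched and changes only how distances are computed.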


Similar resources

An Analysis of Four Missing Data Treatment Methods for Supervised Learning

One relevant problem in data quality is the presence of missing data. Despite the frequent occurrence and the relevance of missing data problem, many Machine Learning algorithms handle missing data in a rather naive way. However, missing data treatment should be carefully thought, otherwise bias might be introduced into the knowledge induced. In this work we analyse the use of the k-nearest nei...


A Study of K-Nearest Neighbour as an Imputation Method

Data quality is a major concern in Machine Learning and other correlated areas such as Knowledge Discovery from Databases (KDD). As most Machine Learning algorithms induce knowledge strictly from data, the quality of the knowledge extracted is largely determined by the quality of the underlying data. One relevant problem in data quality is the presence of missing data. Despite the frequent occu...


Benchmarking K-nearest Neighbour Imputation with Homogeneous Likert Data (P. Jönsson and C. Wohlin, Empirical Software Engineering)

Missing data are common in surveys regardless of research field, undermining statistical analyses and biasing results. One solution is to use an imputation method, which recovers missing data by estimating replacement values. Previously, we have evaluated the hot-deck k-Nearest Neighbour (kNN) method with Likert data in a software engineering context. In this paper, we extend the evaluation by ...


Fractal Approximate Nearest Neighbour Search in Log-Log Time

Nearest neighbour searches in the image plane are among the most frequent problems in a variety of computer vision and image processing tasks. They can be used to replace missing values in image filtering, or to group close objects in image segmentation, or to access neighbouring points of interest in feature extraction. In particular, we address two nearest neighbour problems: The nearest neig...


Comparison of imputation methods for missing laboratory data in medicine

OBJECTIVES Missing laboratory data is a common issue, but the optimal method of imputation of missing values has not been determined. The aims of our study were to compare the accuracy of four imputation methods for missing completely at random laboratory data and to compare the effect of the imputed values on the accuracy of two clinical predictive models. DESIGN Retrospective cohort analysi...


Publication date: 2017